Ontology Based Framework for Web Page Information Extraction
Authors
Abstract
Web information is dynamic and irregular, which makes it difficult to search and integrate. The biggest task in making WWW data accessible to users and agents is extracting the data from Web pages. We take advantage of information in existing Web pages to create structured data semi-automatically. Extracting information from semi-structured or unstructured documents, such as Web pages, is a useful yet complex task. Research has demonstrated that ontologies can be used to achieve a high degree of accuracy in data extraction while remaining resilient to document changes. This paper proposes an ontology-based information extraction system and applies it to the online bookstore domain. Test results show that the algorithm does not rely on page structure and that it can increase the recall and precision of information extraction.
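The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the general idea of ontology-driven extraction that ignores page structure. It is not the authors' method. The `BOOK_ONTOLOGY` dictionary, its three concepts, the regular expressions, and the sample pages are all hypothetical and chosen purely for illustration; a real ontology would also model relationships and constraints between concepts.

```python
import re
from html.parser import HTMLParser

# Hypothetical mini-ontology for the bookstore domain (an assumption, not
# taken from the paper): each concept maps to a pattern that recognizes
# its value in plain text, independent of the surrounding markup.
BOOK_ONTOLOGY = {
    "isbn": re.compile(r"\bISBN:?\s*([0-9][0-9Xx-]{8,16})"),
    "price": re.compile(r"\bPrice:?\s*\$\s*([0-9]+(?:\.[0-9]{2})?)", re.I),
    "author": re.compile(r"\b(?:Author|by):?\s+([A-Z][\w.]*(?: [A-Z][\w.]*)*)"),
}


class TextOnly(HTMLParser):
    """Collects the visible text of a page and discards every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def extract(html):
    """Return a dict of ontology concepts found in the page text."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # One text chunk per line, so a value never runs into the next label.
    text = "\n".join(parser.chunks)
    record = {}
    for concept, pattern in BOOK_ONTOLOGY.items():
        match = pattern.search(text)
        if match:
            record[concept] = match.group(1)
    return record


# Two made-up pages with the same content but different layouts.
table_page = (
    "<table><tr><td>Author:</td><td>Jane Doe</td></tr>"
    "<tr><td>ISBN:</td><td>978-0-00-000000-2</td></tr>"
    "<tr><td>Price:</td><td>$12.50</td></tr></table>"
)
div_page = (
    "<div><p>by Jane Doe</p><span>Price $12.50</span>"
    "<b>ISBN 978-0-00-000000-2</b></div>"
)

print(extract(table_page))
print(extract(div_page))
```

Because the patterns are matched against the text alone, the table-based page and the div-based page yield the same record, which is the kind of layout independence the abstract claims. Recall and precision in such a sketch depend entirely on how well the patterns cover the domain's vocabulary.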
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
Semantic Extraction from List Web Pages
Extracting structured information from web pages is a problem that has many applications and that has gained increased interest in recent years. We propose an approach that can achieve extraction and semantic description of data contained in a list web page. Our approach is fully automatic and is based on a "seed" ontology that contains minimal information about the domain. It uses an instance-base...
An Ontology-Based Extraction Framework for a Semantic Web Application
The Semantic Web vision is rapidly becoming a mainstream reality, but obstacles remain in the way. A major challenge is the adoption of practical Semantic Web applications and the production of vast stores of ubiquitous meta-data which is needed to allow robust inference engines to attain the goals of machine readability of web documents. The authors propose the Semantic Web Applications (SEMWA...
WPPS: A Novel and Comprehensive Framework for Web Page Understanding and Information Extraction
In this paper, we present WPPS, a new, highly configurable Java-based framework for developing efficient and robust methods that address problems in the fields of web page understanding and information extraction. Furthermore, we introduce the representation of a web page as a unified ontological model (UOM), describing its different aspects such as layout, visual features, interface, DOM tree,...